ADAPTIVE SPEECH FILTER

Field of the Invention 

This invention relates to a method and system for improving the estimates of the
spectral components of an information signal, such as speech, from a signal containing
both the information signal and noise. The method is particularly suited to
implementation on a digital signal processor. The invention also provides the basis for
signal enhancement and improved detection of the presence of an information signal.

Background of the Invention

The spectral components of an information signal are used in a number of signal
processing systems, including channel vocoders for communication of speech, speech
recognition systems and signal enhancement filters. Since the inputs to these systems are
often contaminated by noise, there has been a great deal of interest in noise reduction
techniques.

The effect of uncorrelated noise is to add a random component to the power in
each frequency band.

Noise-free spectral components are required for channel vocoders. In a vocoder
the input signal is filtered into a number of different frequency bands and the signal from
each band is rectified (squared) and smoothed (low pass filtered). The smoothing
process tends to reduce the variance of the noise. Such methods are disclosed in U.S.
Patent No. 3,431,355 to Rothauser et al. and U.S. Patent No. 3,431,355 to Schroeder.
An alternative approach is disclosed in U.S. Patent No. 3,855,423 to Brendzel et al. In
this approach the level of the noise in each band is estimated from successive minima of
the energy in that band and the level of the signal is estimated from successive maxima.
In U.S. Patent No. 4,000,369 to Paul et al., the noise levels are estimated in a similar
fashion and subtracted from the input signals to obtain a better estimate of the speech
signal in each band. This method reduces the mean value of the noise.

Another application of spectral processing is speech filtering. Weiss et al., in
"Processing Speech Signals to Attenuate Interference", presented at the IEEE Symp.
Speech Recognition, April 1974, disclose a spectral shaping technique. This technique
uses frequency domain processing and describes two approaches: amplitude modulation
(which is equivalent to gain control) and amplitude clipping (which is equivalent to a
technique called spectral subtraction). Neither the noise estimate nor the speech estimate
is updated, so this filter is not adaptive. An output time waveform is obtained by
recombining the spectral estimates with the original phases.

An adaptive speech filter is disclosed in U.S. Patent No. 4,185,168 to Graupe and
Causey, which is included by reference herein. Graupe and Causey describe a method for
the adaptive filtering of a noisy speech signal based on the assumption that the noise has
relatively stationary statistics compared to the speech signal.

In Graupe and Causey's method the input signal is divided into a set of signals
limited to different frequency bands. The signal to noise ratio for each signal is then
estimated in accordance with the time-wise variations of its absolute value. The gain of
each signal is then controlled according to an estimate of the signal to noise ratio (the
gain typically being close to unity for high signal to noise ratio and less than unity for low
signal to noise ratio).

Graupe and Causey describe a particular method for estimating the noise power
from successive minima in the signals, and describe several methods for determining the
gain as a function of the estimated noise and signal powers. This is an alternative to the
method described earlier in U.S. Patent No. 4,025,721 to Graupe and Causey, which
detects the pauses between utterances in the input speech signal and updates estimates of
the noise parameters during these pauses. In U.S. Patent No. 4,025,721, Graupe and
Causey describe the use of Wiener and Kalman filters to reduce the noise. These filters
can be implemented in the time domain or the frequency domain.

Boll, in "Suppression of Acoustic Noise in Speech using Spectral Subtraction",
IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP-27, No. 2,
April 1979, describes a computationally more efficient way of doing spectral subtraction.

In the spectral subtraction technique, used by Paul, Weiss and Boll, a constant or
slowly-varying estimate of the noise spectrum is subtracted. However, successive
measurements of the noise power in each frequency bin vary rapidly and only the mean
level of the noise is reduced by spectral subtraction. The residual noise will depend upon
the variance of the noise power. This is true also of Weiss's spectral shaping technique,
where the spectral gains are constant. In Graupe's method the gain applied to each bin is
continuously varied so that both the variance and the mean level of the noise can be
reduced.

There are many schemes for determining the spectral gains. One scheme is
described by Ephraim and Malah in "Speech enhancement using a minimum mean-square
error short-time spectral amplitude estimator", IEEE Transactions on Acoustics, Speech
and Signal Processing, Vol. ASSP-32, No. 6, Dec. 1984. This describes a technique for
obtaining two estimates of the signal to noise ratio: one from the input signal and one
from the output signal. It does not update the estimate of the noise level. The gain is a
complicated mathematical function of these two estimates, so this method is not suitable
for direct implementation on a digital processor.

In U.S. Patent No. 5,012,519 to Aldersburg et al. the gain estimation technique of
Ephraim and Malah is combined with the noise parameter estimation method disclosed in
U.S. Patent No. 4,025,721 to Graupe and Causey to provide a fully adaptive system.
The mathematical function of Ephraim and Malah is replaced with a two-dimensional
lookup table to determine the gains. However, since the estimates of the signal to noise
ratio can vary over a very large range, this table requires a large amount of expensive
processor memory. Aldersburg et al. use a separate voice detection system on the input
signal, which requires significant additional processing.

There is therefore a need for an efficient adaptive signal enhancement filter
suitable for implementation on an inexpensive digital signal processor.

There is also a need for a robust noise estimator which can cope with changes in
the noise characteristics.

There is also a need for an efficient signal detection system.

Summary of the Invention

This invention relates to an improved adaptive spectral estimator for improving
the estimates of the spectral components in a signal containing both an information
signal, such as speech or music, and noise. The improvements relate to a noise power
estimator and a computationally efficient gain calculation method. The adaptive spectral
estimator is particularly suited to implementation using digital signal processing. The
estimator can be used to provide improved spectral estimates of the information signal
and can be combined with a speech or voice recognition system. A further object of the
invention is to provide an accurate method for voice detection.

Brief Description of the Drawings 

Figure 1 is a diagrammatic view of a system of the prior art. 
Figure 2 is a diagrammatic view of a system of the current invention.
Figure 3 is a diagrammatic view of a system for gain modification.
Figure 4 is a diagrammatic view of a system for signal power estimation.
Figure 5 is a diagrammatic view of a system for noise power estimation.
Figure 6 is a diagrammatic view of an information signal detector.

Description of the Preferred Embodiment

The method is a modified version of that described in U.S. Patent No. 4,185,168
to Graupe and Causey, which describes a method for the adaptive filtering of a noisy
speech signal. The method is based on the assumption that the noise has relatively
stationary statistics compared to the speech signal.

The input to the filter is usually a digital signal obtained by passing an analog
signal, containing noise and the information signal, through high- and low-pass filters and
then sampling the resulting signal at a sample rate of at least 8 kHz. The high pass filter
is designed to remove low frequency noise which might adversely affect the dynamic
range of the filter. The turnover frequency of the high pass filter is less than f_low,
where f_low is the lower limit of the speech band in Hertz. The low pass filter is an
anti-aliasing filter which has a turnover frequency of at least f_high, where f_high is
the upper limit of the speech band in Hertz. The order of the low pass filter is
determined by the sampling frequency and the need to prevent aliasing.

The output signal is calculated by filtering the input signal using a frequency 
domain filter with real coefficients and may be a time series or a set of spectral estimates. 

If the output is a time series then it may be passed to a digital to analog converter
(DAC) and an analog anti-imaging filter to produce an analog output signal, or it may be
used as an input to subsequent signal processing.

WO 96/24127    PCT/US96/011S5

The estimator of the spectral components comprises four basic steps:

1. Calculation of the spectrum of the input signal.

2. Estimation of the signal and noise power in each frequency bin within the
speech band (f_low to f_high Hz).

3. Calculation of the gains (coefficients) of the frequency domain filter for each
frequency bin.

4. Calculation of the spectral estimates by multiplying each input spectral
component by the corresponding gain.

This is basically the method of Graupe and Causey, which is summarized in Figure
1. Each of the processes is described in detail below.

The spectral components of the input signal can be obtained by a variety of
means, including band pass filtering and Fourier transformation. In one embodiment a
discrete or fast Fourier transform is used to transform sequential blocks of N points of
the input time series. A window function, such as a Hanning window, can be applied, in
which case an overlap of N/2 points can be used. A Discrete Fourier Transform (DFT)
can be used at each frequency bin in the speech band or, alternatively, a Fast Fourier
Transform (FFT) can be used over the whole frequency band. The spectrum is stored for
each frequency bin within the speech band. For some applications it is desirable to have
unequally spaced frequencies; in these applications a Fast Fourier Transform cannot be
used and each component may have to be calculated independently. In one embodiment
the input spectrum, X, is calculated as the Fourier transform of the input time series, x,
namely

    X = Fourier transform { x, window function, N }.

The power in the input spectrum is given by

    power = modulus squared { X }.

Alternatively, a band pass filter may be used, in which case the power may be
estimated by rectifying and smoothing the filter output.
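
As an illustration of the spectrum and power calculation described above, the following
Python sketch applies a Hanning window to one block of N points and evaluates a direct
DFT only at the bins of a speech band. The function names, the block length N = 64, the
8 kHz sample rate and the band limits are assumptions chosen for the example and are not
taken from the text; the periodic form of the window is likewise an assumption.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def band_spectrum(block, window, bins):
    """Direct DFT of one windowed block, evaluated only at the requested bins."""
    n = len(block)
    spectrum = {}
    for f in bins:
        acc = 0j
        for i in range(n):
            acc += block[i] * window[i] * cmath.exp(-2j * math.pi * f * i / n)
        spectrum[f] = acc
    return spectrum


def bin_power(spectrum):
    """power = modulus squared { X } for each stored bin."""
    return {f: abs(x) ** 2 for f, x in spectrum.items()}


# Hypothetical example values (assumptions, not from the text).
N = 64
SAMPLE_RATE = 8000.0
bin_spacing_hz = SAMPLE_RATE / N          # 125 Hz between bins
tone_bin = 8                              # a 1 kHz tone falls exactly on bin 8
x = [math.cos(2.0 * math.pi * tone_bin * i / N) for i in range(N)]
speech_bins = range(3, 28)                # about 375 Hz to 3375 Hz, assumed band

X = band_spectrum(x, hann_window(N), speech_bins)
power = bin_power(X)
peak_bin = max(power, key=power.get)      # bin holding the largest power
```

Evaluating the DFT bin by bin, as here, corresponds to the case where only the speech band
is needed or the frequencies are unequally spaced; an FFT over the whole band would give
the same values at the equally spaced bins.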

The system of Graupe and Causey is shown in Figure 1.

The input signal, x, is passed to a bank of band pass filters. One of these filters, 1, is
shown in Figure 1. This produces an input component X. The power of this component
is measured at 2.

The method requires that estimates are made of the signal power, signal, and the
noise power, noise. The noise power is estimated at 3 with a time constant related to
the time over which the noise can be considered stationary. The signal power is estimated at 4.
From these estimates the Wiener filter gain, W, is calculated as the ratio of the power in
the information signal to the total power. This is done at 5 in Figure 1. For each
frequency bin this is

    W = signal / (noise + signal).

In the method of Graupe and Causey the Wiener gain, W, is directly applied to
the corresponding component of the input spectrum. In the unmodified scheme the
spectral components of the output are given by multiplying the input component by the
gain at 6 in Figure 1. The result is

    Y = W * X.

If the output time series, y, is required it can be calculated by an inverse FFT (or
DFT) and the 'overlap-add' method, or by summing the components from individual
channels using channel summer 7 in Figure 1.
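
The Wiener gain and its direct application to an input component can be sketched in
Python as follows. The function names and the numeric values are illustrative
assumptions, and the guard for a bin with zero total power is not part of the text.

```python
def wiener_gain(signal, noise):
    """W = signal / (noise + signal); returns 0 for an empty bin (assumed guard)."""
    total = noise + signal
    return signal / total if total > 0.0 else 0.0


def apply_gain(gain, x):
    """Y = W * X. The gain is real, so only the amplitude of X changes."""
    return gain * x


w_high = wiener_gain(signal=9.0, noise=1.0)   # high signal to noise ratio
w_low = wiener_gain(signal=1.0, noise=9.0)    # low signal to noise ratio
y = apply_gain(w_high, complex(3.0, 4.0))     # one complex input component
```

With these values the gain is close to unity when the signal power dominates and well
below unity when the noise power dominates, which is the behaviour described for the
method of Graupe and Causey.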

After each iteration k the output block of time points is updated as

    y_k(1:N) = inverse Fourier transform { Y, N }
    y_k(1:N/2) = y_k(1:N/2) + y_{k-1}(N/2+1:N)

The first N/2 points of y_k are then sent to the DAC or may be used for further
processing.
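
A minimal Python sketch of the overlap-add update above is given below. To keep the
example short it assumes a filter with unity gain, so that each block returned by the
inverse transform is simply the windowed input block; the block length N = 8 and the
constant input are also assumptions made for the illustration.

```python
import math


def overlap_add(blocks):
    """Combine half-overlapping blocks of length N into one output series."""
    n = len(blocks[0])
    half = n // 2
    tail = [0.0] * half              # y_{k-1}(N/2+1:N), zero before the first block
    out = []
    for yk in blocks:
        # y_k(1:N/2) = y_k(1:N/2) + y_{k-1}(N/2+1:N)
        out.extend(yk[i] + tail[i] for i in range(half))
        tail = yk[half:]             # keep the second half for the next update
    return out


N = 8
window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / N) for i in range(N)]
x = [1.0] * (4 * N)
# Windowed blocks taken every N/2 samples (stand-ins for the inverse transforms).
blocks = [[x[s + i] * window[i] for i in range(N)]
          for s in range(0, len(x) - N + 1, N // 2)]
y = overlap_add(blocks)
```

Apart from the first N/2 points, where no previous block is available, the overlapped
Hanning windows sum to one and the constant input is reproduced.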

An improved system of the current invention is shown in Figure 2. The additional
features are described below.

Gain Modification

When the signal to noise ratio is low the direct use of the Wiener gain results in a
residual noise which has a musical or artificial character.

One improvement of the current invention is the use of a gain modifier, 8 in Figure
2, which reduces the musical nature of the residual noise. The gain modifier, which is
shown in Figure 3, will now be described.

The instantaneous power of the information signal can be estimated as the
product of the instantaneous total power and the Wiener gain. Dividing this product by
the noise power gives an estimate of the instantaneous signal to noise ratio, snr, in each
frequency bin. It is obtained by dividing the power by the noise at 10 in Figure 3, and
using this ratio to modulate or multiply the Wiener gain at 11. Hence

    snr = W * (power / noise).
A function of the signal to noise ratio is then calculated at 12. The modified filter
gains (coefficients), which are denoted by the vector C, are calculated by dividing this
function of the signal to noise ratio by the ratio of the power to the noise at 13. This is
done for each frequency, so that

    C = F{snr} * (noise / power) = F{snr} / (power / noise),

where F is a function of a single variable and is therefore well suited to
implementation on a DSP as a look-up table or an analytic function. One form of the
function F is given by

    F(x) = x,      x < snr0
    F(x) = x + c,  x > snr0

where c and snr0 are constants. Other forms can be used, but it is desirable that
the function is approximately linear at high signal to noise ratios. In particular the gain of
Ephraim and Malah may be manipulated so that it can be implemented in this form.

The spectral output, Y, that is the estimate of the spectrum of the information
signal, is calculated by multiplying the input spectral components by the corresponding
modified gains at 6 in Figure 2, so that for each frequency

    Y = C * X.

Signal Estimation

Ephraim and Malah describe a method for updating a signal to noise ratio. This
method can be modified to give an estimate of the signal power, signal. This signal
estimator (4 in Figure 2) uses the power in the output signal calculated at 9 in Figure 2.
The method is shown in detail in Figure 4 and is given by

    sig1 = maximum { power - noise, 0 }
    sig2 = modulus squared { Y }
    signal = (1 - beta) * sig1 + beta * sig2

The difference between the current total power and the estimate of the noise is
calculated at 14. This signal is then half wave rectified at 15. The signal estimate is
obtained as a weighted sum 16 of this rectified signal and the power in the output signal.
The weighting parameter beta used in the weighted sum is typically chosen to be greater
than 0.9 and less than 1.
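
The gain modification and signal estimation steps of the two sections above can be
sketched for a single frequency bin as follows. The piecewise function is written as the
form of F appears in the text, but the values chosen for c and snr0, the value of beta and
the function names are assumptions made for the illustration; the sketch also assumes
that the noise and power are strictly positive.

```python
def f_piecewise(x, c=0.5, snr0=1.0):
    """One form of F: x below snr0, x + c otherwise (c, snr0 are placeholder values)."""
    return x if x < snr0 else x + c


def modified_gain(power, noise, signal, f=f_piecewise):
    """Return (snr, C) for one frequency bin; assumes power > 0 and noise > 0."""
    w = signal / (noise + signal)     # Wiener gain
    ratio = power / noise             # power divided by noise (10 in Figure 3)
    snr = w * ratio                   # Wiener gain modulated by the ratio (11)
    c_gain = f(snr) / ratio           # function at 12, division at 13
    return snr, c_gain


def signal_estimate(power, noise, y_prev, beta=0.95):
    """signal = (1 - beta) * sig1 + beta * sig2 for one frequency bin."""
    sig1 = max(power - noise, 0.0)    # half wave rectified difference (14, 15)
    sig2 = abs(y_prev) ** 2           # power of the stored output component
    return (1.0 - beta) * sig1 + beta * sig2


signal = signal_estimate(power=4.0, noise=1.0, y_prev=complex(1.0, 1.0))
snr, C = modified_gain(power=4.0, noise=1.0, signal=signal)
```

Because F takes a single argument it could equally be replaced by a one-dimensional
look-up table indexed by snr, which is the property the text relies on for an efficient
DSP implementation.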

Noise Estimation

The estimates of the noise can be updated during the pauses in the information
signal. The pauses can be detected by looking at a weighted sum of the signal to noise
components across frequency bins (a uniform weighting may be used). If this weighted
sum is below a predetermined threshold, Smin say, the noise estimate at each frequency
is updated as

    noise = noise + alpha * maximum { power - noise, 0 }

where alpha is a parameter which determines the time constant of the estimate.
alpha is typically chosen to be greater than 0.9 and less than 1.

An alternative noise estimator may be obtained by using the assumption that the
information signal and the noise signal are uncorrelated. The signal power can be
estimated from the output components, Y, and subtracted from the total power,
old_power, from the previous update. That is

    temp = alpha * (old_power[f] - signal)
    noise = (1 - alpha) * noise + alpha * sign { temp } * minimum { abs(temp), noise/2 }

This noise estimator is depicted in Figure 5. The difference between the total
power and the signal power is calculated at 17; it is then multiplied by alpha at 18. The
previous noise estimate is multiplied by (1 - alpha) at 19 and added in 20 to the output of
multiplier 18. The two noise estimators described above differ from those previously
used in that they make use of the signal estimate. Other forms of noise estimators can be
used, including combinations of the above two methods.
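
The two update formulas above can be transcribed into Python term by term, as in the
sketch below. The function names, the threshold value and the numeric inputs are
assumptions made for the illustration; no tuning of alpha is implied.

```python
import math


def update_noise_in_pause(noise, power, snr_sum, s_min, alpha):
    """First estimator: update only when the weighted snr sum is below Smin."""
    if snr_sum < s_min:
        return noise + alpha * max(power - noise, 0.0)
    return noise                      # information signal present, keep the estimate


def update_noise_uncorrelated(noise, old_power, signal, alpha):
    """Alternative estimator, written exactly as the two formulas in the text."""
    temp = alpha * (old_power - signal)
    # sign{temp} * minimum{abs(temp), noise/2}
    step = math.copysign(min(abs(temp), noise / 2.0), temp)
    return (1.0 - alpha) * noise + alpha * step


ALPHA = 0.95                          # illustrative value within the stated range
in_pause = update_noise_in_pause(1.0, 3.0, snr_sum=0.2, s_min=0.5, alpha=ALPHA)
in_speech = update_noise_in_pause(1.0, 3.0, snr_sum=2.0, s_min=0.5, alpha=ALPHA)
small_step = update_noise_uncorrelated(2.0, old_power=2.5, signal=2.0, alpha=ALPHA)
clipped_step = update_noise_uncorrelated(2.0, old_power=10.0, signal=0.0, alpha=ALPHA)
```

In the last call the difference term exceeds noise/2, so the minimum limits the size of
the correction applied in a single update.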

Information Signal Detector

The presence of an information signal can be detected by looking at a weighted
sum of the signal to noise components across frequency bins (a uniform weighting may
be used). If this weighted sum is above a predetermined threshold, the signal is assumed
to contain information. This is shown in Figure 6: the signal to noise ratios are weighted
at 22 and then summed at 23 before being passed to the threshold detector 24.
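
The detector of Figure 6 can be sketched in Python as a weighted sum followed by a
comparison. The function name, the threshold and the example snr values are assumptions;
the uniform weights are normalised to sum to one, which is a choice made for this sketch
rather than something stated in the text.

```python
def information_present(snr_per_bin, threshold, weights=None):
    """Weight (22) and sum (23) the per-bin snr values, then apply the threshold (24)."""
    if weights is None:
        # Uniform weighting, normalised so the sum is the mean snr (assumption).
        weights = [1.0 / len(snr_per_bin)] * len(snr_per_bin)
    total = sum(w * s for w, s in zip(weights, snr_per_bin))
    return total > threshold, total


THRESHOLD = 1.0                                   # hypothetical threshold
speech_detected, speech_sum = information_present([4.0, 6.0, 2.0, 0.0], THRESHOLD)
pause_detected, pause_sum = information_present([0.1, 0.2, 0.1, 0.0], THRESHOLD)
```

The same weighted sum, compared against Smin, can serve as the pause detector used by
the first noise estimator.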

A Particular Embodiment

One embodiment of the method is described below.

    at each update number k
        X = Fourier transform { x, window function, N }
        FOR each frequency number f in speech band
            power = modulus squared { X[f] }
            sig1 = maximum { power - noise[f], 0 }
            sig2 = modulus squared { Y[f] }
            signal = (1 - beta) * sig1 + beta * sig2
            W = signal / ( noise[f] + signal )
            snr = W * ( power / noise[f] )
            C = F{snr} / ( power / noise[f] )
            temp = alpha * ( old_power[f] - signal )
            noise[f] = (1 - alpha) * noise[f] +
                       alpha * sign{temp} * minimum { abs(temp), noise[f]/2 }
            old_power[f] = power
            Y[f] = C * X[f]
        ENDFOR
        y_k(1:N) = inverse Fourier transform { Y, N }
        y_k(1:N/2) = y_k(1:N/2) + y_{k-1}(N/2+1:N)

At the end of each iteration, k, the signal y_k(1:N/2) provides an estimate of the
information signal.
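
The update loop of this embodiment can be transcribed into runnable Python as sketched
below. This is a line-by-line rendering of the steps listed above and not a tuned
implementation. The class name, the plain DFT helpers, the initial noise value, the
constants used in F and the two small floors that keep the divisions defined are all
assumptions added for the example.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * f * i / n) for i in range(n))
            for f in range(n)]


def idft_real(spectrum):
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * i / n)
                for f in range(n)).real / n for i in range(n)]


def f_piecewise(x, c=0.5, snr0=1.0):
    return x if x < snr0 else x + c          # c and snr0 are placeholder values


class AdaptiveFilter:
    FLOOR = 1e-6                             # assumed floor, not in the text

    def __init__(self, n, band, alpha=0.95, beta=0.95, noise0=1.0):
        self.n, self.alpha, self.beta = n, alpha, beta
        self.band = list(band)               # bins f with 0 < f < n/2
        self.window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
        self.noise = {f: noise0 for f in self.band}
        self.old_power = {f: noise0 for f in self.band}
        self.Y = {f: 0j for f in self.band}
        self.tail = [0.0] * (n // 2)         # y_{k-1}(N/2+1:N)

    def update(self, block):
        n, a, b = self.n, self.alpha, self.beta
        X = dft([block[i] * self.window[i] for i in range(n)])
        full = [0j] * n
        for f in self.band:
            power = abs(X[f]) ** 2
            sig1 = max(power - self.noise[f], 0.0)
            sig2 = abs(self.Y[f]) ** 2
            signal = (1.0 - b) * sig1 + b * sig2
            W = signal / (self.noise[f] + signal)
            ratio = max(power, self.FLOOR) / self.noise[f]
            snr = W * ratio
            C = f_piecewise(snr) / ratio
            temp = a * (self.old_power[f] - signal)
            step = math.copysign(min(abs(temp), self.noise[f] / 2.0), temp)
            self.noise[f] = max((1.0 - a) * self.noise[f] + a * step, self.FLOOR)
            self.old_power[f] = power
            self.Y[f] = C * X[f]
            full[f] = self.Y[f]
            full[n - f] = self.Y[f].conjugate()   # mirror bin keeps the output real
        yk = idft_real(full)
        out = [yk[i] + self.tail[i] for i in range(n // 2)]
        self.tail = yk[n // 2:]
        return out                           # y_k(1:N/2)


# Hypothetical usage: a tone on bin 4, blocks of 32 points advanced by 16.
flt = AdaptiveFilter(n=32, band=range(2, 14))
x = [math.cos(2.0 * math.pi * 4 * i / 32) for i in range(32 * 4)]
outputs = [flt.update(x[s:s + 32]) for s in range(0, len(x) - 32 + 1, 16)]
```

Each call to update consumes one block of N input points and returns the N/2 output
points that are complete after the overlap-add step.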

Claims

1. A method for estimating the frequency components of an information signal from
an input signal containing both the information signal and noise, said method
comprising

filtering the input signal through a set of band pass filters to produce a set
of input frequency components, one for each frequency band, and, for each
frequency component,

calculating the total power in each input frequency component,

estimating the power of the information signal included therein,

calculating a gain for each frequency band as a function of the total
power, the estimate of the power in the information signal and a previous
estimate of the noise power,

multiplying the input frequency component by said gain to thereby
produce an estimate of the frequency component of said information signal,

estimating a new noise power estimate from the previous noise power
estimate and the difference between the total power in the input frequency
component and the estimate of the frequency component of said information
signal.

2. A method as in claim 1 in which the gain in each frequency band is determined by

estimating a Wiener gain from said previous noise power estimate and the
estimate of the power of the information signal,

multiplying said Wiener gain by the ratio of the power of the input
frequency component to the estimated noise power to produce an estimate of
the signal to noise ratio,

calculating a function of the estimated signal to noise ratio,

dividing said function of the estimated signal to noise ratio by the ratio of
the power of the input frequency component to the estimated noise power to
thereby produce a modified gain.

3. A method as in claim 1 and including the step of estimating the overall signal to
noise ratio from a weighted sum of the estimated signal to noise ratios in each
frequency band.

4. A method as in claim 3 and including the step of using said estimated overall
signal to noise ratio to determine the presence of an information signal in the
input signal.

5. A method as in claim 3 and including the step of estimating the noise power from
the total power in the input frequency component, a previous noise power
estimate and the estimate of the overall signal to noise ratio.

6. A method as in claim 1 and including the step of recombining the estimates of the
frequency components of said information signal to produce a noise reduced
output signal.

7. A method as in claim 1 in which the power of the information signal is estimated
from a combination of the previous estimate of the frequency components of said
information signal and the positive difference between the power in the input
frequency component and the noise power estimate.

8. A method as in claim 1 in which the filtering is performed via a Fourier transform.

9. A method as in claim 1 which is used as a preprocessor to a speech or voice
recognition system.

10. A method as in claim 6 which is used for reducing noise in a communications
system.

11. A system for estimating the frequency components of an information signal from
an input signal containing both the information signal and noise, said system
comprising

filter means for filtering the input signal through a set of band pass filters to
produce a set of input frequency components, one for each frequency band,
and, for each frequency component,

a first calculating means for calculating the total power in each input
frequency component,

an estimating means for estimating the power of the information signal
included therein,

a second calculating means for calculating a gain for each frequency band
as a function of the total power, the estimate of the power in the information
signal and a previous estimate of the noise power,

gain multiplying means for multiplying the input frequency component by
said gain to thereby produce an estimate of the frequency component of said
information signal, whereby said estimating means estimates a new noise
power estimate from the previous noise power estimate and the difference
between the total power in the input frequency component and the estimate of
the frequency component of said information signal.

12. A system as in claim 11 in which the second calculating means includes means for
estimating a Wiener gain from said previous noise power estimate and the
estimate of the power of the information signal,

Wiener multiplying means for multiplying said Wiener gain by the ratio of
the power of the input frequency component to the estimated noise power to
produce an estimate of the signal to noise ratio,

function calculating means for calculating a function of the estimated
signal to noise ratio, and

division means for dividing said function of the estimated signal to noise
ratio by the ratio of the power of the input frequency component to the estimated
noise power to thereby produce a modified gain.